Data Sets

In this report we analyse origin-destination (OD) flux matrices based on the furthest-extent definition. We fit each model to the total BBC mobility data set (\(\Omega^{T}\)) and to three stratifications of it: by employment status (\(\Omega^{U}\), \(\Omega^{Ed}\), \(\Omega^{Em}\), \(\Omega^{N}\)), by age of user (\(\Omega^{U}\), \(\Omega^{18-30}\), \(\Omega^{30-60}\), \(\Omega^{60-100}\)) and by member nation of the UK (\(\Omega^{E}\), \(\Omega^{W}\), \(\Omega^{S}\), \(\Omega^{NI}\)). The under-18 data set \(\Omega^{U}\) is common to the employment and age stratifications.

We compare the estimated mobility models to estimates from the 2011 census commuting flow data for England (\(\Omega^{CE}\)), Wales (\(\Omega^{CW}\)), Scotland (\(\Omega^{CS}\)) and Northern Ireland (\(\Omega^{CNI}\)).

We estimate posterior distributions for each model using Hamiltonian Monte Carlo (HMC), as implemented in Stan (http://mc-stan.org/). To assess model fit and provide a basis for model selection, we use approximate leave-one-out cross-validation as implemented in the loo package (doi:10.1007/s11222-016-9696-4).

The per capita probability of moving to a different LAD each day varies by category:

## # A tibble: 5 x 5
##   employment_cat     N movers p_move cat_prop
##   <chr>          <int>  <int>  <dbl>    <dbl>
## 1 Under 18        2955    914  0.309   0.0683
## 2 Education       3511   1390  0.396   0.0811
## 3 Employed       30500  17227  0.565   0.705 
## 4 NEET            6325   1998  0.316   0.146 
## 5 Total          43291  21529  0.497  NA
## # A tibble: 4 x 5
##   age_cat      N movers p_move cat_prop
##   <chr>    <int>  <int>  <dbl>    <dbl>
## 1 Under 18  2955    914  0.309   0.0683
## 2 18-30     9611   4859  0.506   0.222 
## 3 30-60    26009  14015  0.539   0.601 
## 4 60-100    4716   1741  0.369   0.109
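As a quick consistency check, the p_move and cat_prop columns can be recomputed from the counts: p_move is movers / N within a category and cat_prop is that category's share of all users. The Python sketch below uses the employment counts from the table; the function name summarise is our own and not part of the original analysis code.

```python
# Recompute per-category move probability and category share from raw counts.
# Counts are copied from the employment table above; names are illustrative.
employment = {
    "Under 18": (2955, 914),   # (N, movers)
    "Education": (3511, 1390),
    "Employed": (30500, 17227),
    "NEET": (6325, 1998),
}


def summarise(counts):
    """Return per-category (p_move, cat_prop) plus pooled totals."""
    total_n = sum(n for n, _ in counts.values())
    total_movers = sum(m for _, m in counts.values())
    rows = {
        cat: (movers / n, n / total_n)  # p_move, cat_prop
        for cat, (n, movers) in counts.items()
    }
    return rows, total_n, total_movers, total_movers / total_n


rows, total_n, total_movers, p_total = summarise(employment)
for cat, (p_move, cat_prop) in rows.items():
    print(f"{cat:10s} p_move={p_move:.3f} cat_prop={cat_prop:.4f}")
print(f"Total      N={total_n} movers={total_movers} p_move={p_total:.3f}")
```

The same function applied to the age counts gives the same totals, since both stratifications partition the same 43291 users.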

The probability of moving also varies by LAD:

## 
## Call:
## lm(formula = df$p_move ~ df$census_p_move)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.293007 -0.042909 -0.001684  0.040523  0.305933 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       0.15348    0.01071   14.33   <2e-16 ***
## df$census_p_move  0.84845    0.02419   35.07   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07553 on 389 degrees of freedom
## Multiple R-squared:  0.7597, Adjusted R-squared:  0.7591 
## F-statistic:  1230 on 1 and 389 DF,  p-value: < 2.2e-16
##                      2.5 %    97.5 %
## (Intercept)      0.1324184 0.1745389
## df$census_p_move 0.8008799 0.8960149

A linear regression of \(p_{BBC}\) on \(p_{C}\) across the 391 LADs shows a strong linear relationship between the probability of moving estimated from the census data and from the total BBC data set (adjusted R-squared 0.76). The probability of moving to a different LAD per day is \(\sim\) 10% (7-12% 95% CI) greater in the BBC data set.
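The reported fit statistics can be related with a few lines of Python. This is a minimal sketch, not the original R analysis: ols is a plain least-squares fit, adjusted_r2 applies the standard correction for n observations and k predictors, and implied_difference (a hypothetical helper) evaluates the fitted gap \(p_{BBC} - p_{C} = a + (b - 1)\,p_{C}\) using the printed intercept and slope. Because the slope is below 1, that gap shrinks as \(p_{C}\) increases.

```python
# Least-squares fit of p_BBC on p_C, plus the adjusted R^2 correction.
# Pure Python sketch; the real analysis used R's lm().


def ols(x, y):
    """Return (intercept, slope, R^2) for a simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (excluding intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def implied_difference(p_c, intercept=0.15348, slope=0.84845):
    """Fitted p_BBC - p_C at census probability p_c (printed lm coefficients)."""
    return intercept + (slope - 1) * p_c


# Reported Multiple R-squared 0.7597 with 391 LADs and one predictor
# reproduces the reported Adjusted R-squared.
print(round(adjusted_r2(0.7597, 391, 1), 4))  # 0.7591
```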

The coverage of the BBC mobility data set (a median of 81 users per LAD, range 2-948) means that for the majority of LADs the raw data are too sparse to estimate movement rates for each stratum of the BBC model. To address this, we fit a generalised linear mixed model (binomial family, logit link, random intercept for each LAD) to model the per-LAD probability of moving and how it is adjusted for each stratum (age or employment status). We fit the mixed models with the glmer function from the lme4 package.

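\[ \mathrm{move}_{ig} \sim \mathrm{Binomial}(N_{ig}, p_{ig}), \qquad \mathrm{logit}(p_{ig}) = \beta_0 + \beta_g + u_i, \qquad u_i \sim \mathcal{N}(0, \sigma^2_{LAD}) \]

where \(i\) indexes LADs, \(g\) indexes strata (age or employment group) with under-18s as the reference level (\(\beta_g = 0\)), and \(u_i\) is the LAD random intercept. In lme4 notation this is cbind(move, N - move) ~ group + (1 | LAD).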
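To make the model concrete, the sketch below converts the age-model fixed effects printed further down into a probability of moving for each stratum, conditional on a LAD random intercept \(u_i\). It assumes Under 18 is the reference level (it is the only level without a coefficient in the output) and uses the estimated LAD standard deviation (0.6007) to illustrate a LAD one standard deviation above average; these illustrative choices are ours.

```python
import math

# Age-model fixed effects on the logit scale, copied from the glmer output.
INTERCEPT = -0.76628  # reference stratum: Under 18 (assumed reference level)
AGE_EFFECTS = {"Under 18": 0.0, "18-30": 0.87269, "30-60": 0.97002, "60-100": 0.25761}


def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def p_move(group, u_lad=0.0):
    """Probability of moving for a stratum in a LAD with random intercept u_lad.

    u_lad = 0 is a 'typical' LAD; per-LAD values would come from the fitted
    random effects, which are not reproduced here.
    """
    return inv_logit(INTERCEPT + AGE_EFFECTS[group] + u_lad)


for g in AGE_EFFECTS:
    # typical LAD vs. a LAD one random-effect SD (0.6007) above average
    print(g, round(p_move(g), 3), round(p_move(g, u_lad=0.6007), 3))
```

These are conditional (LAD-specific) probabilities; averaging over the distribution of \(u_i\) would give population-average probabilities that differ somewhat on the probability scale.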

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: cbind(move, N - move) ~ group + (1 | LAD)
##    Data: age_df
## 
##      AIC      BIC   logLik deviance df.resid 
##   7135.2   7162.0  -3562.6   7125.2     1559 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.5262 -0.6398  0.0000  0.6584  3.6525 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  LAD    (Intercept) 0.3608   0.6007  
## Number of obs: 1564, groups:  LAD, 391
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.76628    0.05179 -14.795  < 2e-16 ***
## group18-30   0.87269    0.04713  18.516  < 2e-16 ***
## group30-60   0.97002    0.04351  22.295  < 2e-16 ***
## group60-100  0.25761    0.05209   4.946 7.59e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) g18-30 g30-60
## group18-30  -0.702              
## group30-60  -0.760  0.842       
## group60-100 -0.636  0.698  0.757
## # A tibble: 3 x 2
##   group  or              
##   <chr>  <chr>           
## 1 18-30  2.39 (2.18,2.62)
## 2 30-60  2.64 (2.42,2.87)
## 3 60-100 1.29 (1.17,1.43)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: cbind(move, N - move) ~ group + (1 | LAD)
##    Data: emp_df
## 
##      AIC      BIC   logLik deviance df.resid 
##   6847.1   6873.8  -3418.5   6837.1     1559 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.9221 -0.6283  0.0176  0.6700  3.8583 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  LAD    (Intercept) 0.3569   0.5974  
## Number of obs: 1564, groups:  LAD, 391
## 
## Fixed effects:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -0.764516   0.051693 -14.789   <2e-16 ***
## groupEducation  0.527434   0.055713   9.467   <2e-16 ***
## groupEmployed   1.081079   0.043292  24.972   <2e-16 ***
## groupNEET       0.004211   0.050099   0.084    0.933    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) grpEdc grpEmp
## groupEductn -0.593              
## groupEmplyd -0.765  0.718       
## groupNEET   -0.663  0.612  0.790
## # A tibble: 3 x 2
##   group     or              
##   <chr>     <chr>           
## 1 Education 1.69 (1.52,1.89)
## 2 Employed  2.95 (2.71,3.21)
## 3 NEET      1 (0.91,1.11)
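The odds-ratio tables appear consistent with exponentiated fixed effects and Wald 95% intervals, \(\exp(\hat\beta \pm 1.96\,\mathrm{SE})\). The Python check below, using the printed estimates and standard errors, reproduces both tables to two decimal places; whether the original code computed them in exactly this way is an assumption.

```python
import math

Z_975 = 1.959964  # 97.5% quantile of the standard normal


def odds_ratio_ci(estimate, std_error, z=Z_975):
    """Wald odds ratio and 95% interval from a logit-scale estimate and SE."""
    return (math.exp(estimate),
            math.exp(estimate - z * std_error),
            math.exp(estimate + z * std_error))


# (estimate, std. error) pairs copied from the two glmer outputs above
age_fixed = {"18-30": (0.87269, 0.04713),
             "30-60": (0.97002, 0.04351),
             "60-100": (0.25761, 0.05209)}
emp_fixed = {"Education": (0.527434, 0.055713),
             "Employed": (1.081079, 0.043292),
             "NEET": (0.004211, 0.050099)}

res_age = {g: odds_ratio_ci(*v) for g, v in age_fixed.items()}
res_emp = {g: odds_ratio_ci(*v) for g, v in emp_fixed.items()}
for group, (or_, lo, hi) in {**res_age, **res_emp}.items():
    print(f"{group:10s} {or_:.2f} ({lo:.2f},{hi:.2f})")
```

The NEET interval spans 1, matching the non-significant NEET coefficient (p = 0.933).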

Raw flux matrices comparing the census work-flow data set (A) to stratified BBC flux matrices.

PSIS diagnostic plots

The approximate leave-one-out cross-validation (LOO) method uses Pareto-smoothed importance sampling (PSIS-LOO) to efficiently estimate the predictive accuracy of a model (the expected log pointwise predictive density, \(\hat{elpd}\)), which provides a basis for model comparison and selection. The estimated Pareto shape parameter \(\hat{k}\) indicates how reliable the estimate of \(\hat{elpd}\) is for each data point (in our case, each LAD, corresponding to a row of \(\Omega_{ji}\)). The estimate is considered reliable (fast convergence) for \(\hat{k} < 0.5\) and may still be reliable for values of \(\hat{k}\) up to 0.7. Values of \(\hat{k} > 0.7\) suggest that the corresponding data points are highly influential on the estimated posterior and may bias the estimate.
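Applying these thresholds to per-LAD \(\hat{k}\) values amounts to a simple classification. A minimal Python sketch following the thresholds stated above (the LAD names and \(\hat{k}\) values are hypothetical; in practice they would come from the loo output):

```python
from collections import Counter


def classify_khat(k):
    """Classify a PSIS shape estimate using the thresholds described above."""
    if k < 0.5:
        return "good"
    if k <= 0.7:
        return "ok"
    return "bad"


def khat_table(khats):
    """Count LADs in each reliability class and list the flagged ('bad') ones."""
    labels = {lad: classify_khat(k) for lad, k in khats.items()}
    counts = Counter(labels.values())
    flagged = sorted(lad for lad, label in labels.items() if label == "bad")
    return counts, flagged


# Hypothetical k-hat values per LAD (illustrative only, not model output).
example = {"LAD_A": 0.12, "LAD_B": 0.55, "LAD_C": 0.83, "LAD_D": 0.31}
counts, flagged = khat_table(example)
print(dict(counts), flagged)
```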

Impact of Highland LAD on CDE model

The Highland LAD fails the PSIS diagnostic check with \(\hat{k} > 0.7\) (although the effect is smaller than for the next (frequency) OD matrix).

Comparing a model fitted to all 32 Scottish LADs with one fitted to a reduced data set excluding Highland (31 LADs) illustrates the systematic bias that Highland introduces in the distance-scaling parameter \(\rho\). Although the posterior distributions overlap, we consider the effect large enough to justify excluding the Highland LAD from inference and from model comparison.

Model comparison

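The difference between \(\hat{elpd}\) for alternative models fitted to the same data provides a measure of their relative predictive accuracy. In each table below, every pair of columns corresponds to one data set: models are ranked by \(\hat{elpd}\), and each entry gives the difference from the top-ranked model with its standard error in parentheses, so the favoured model is listed as 0 (0).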
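As we understand the loo package's comparison, the difference between two models is the sum of the pointwise (here per-LAD) elpd differences, and its standard error is \(\sqrt{n\,\mathrm{Var}(\text{diff})}\), computed from those same pointwise differences rather than from each model's own standard error. A sketch with hypothetical pointwise values:

```python
import math


def elpd_summary(pointwise):
    """Total elpd and its standard error from pointwise LOO contributions (n >= 2)."""
    n = len(pointwise)
    mean = sum(pointwise) / n
    var = sum((v - mean) ** 2 for v in pointwise) / (n - 1)  # sample variance
    return sum(pointwise), math.sqrt(n * var)


def elpd_difference(pointwise_a, pointwise_b):
    """elpd(a) - elpd(b) with an SE based on the paired pointwise differences."""
    if len(pointwise_a) != len(pointwise_b):
        raise ValueError("models must be evaluated on the same data points")
    diffs = [a - b for a, b in zip(pointwise_a, pointwise_b)]
    return elpd_summary(diffs)


# Hypothetical pointwise elpd values for five LADs under two models.
model_a = [-10.2, -8.7, -12.1, -9.4, -11.0]
model_b = [-10.9, -9.1, -12.0, -10.6, -11.8]
diff, se = elpd_difference(model_b, model_a)  # negative: model_b is worse
print(f"{diff:.2f} ({se:.2f})")
```

Pairing the differences matters because the models are scored on the same LADs, so their pointwise contributions are strongly correlated.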

##     model          elpd   model        elpd   model       elpd   model        elpd
## 1    ERad         0 (0)     CDO       0 (0)     CDO      0 (0)     CDO       0 (0)
## 2     CDO -11500 (1100)     CDP -23.3 (7.8)     CDP -16.1 (11)     CDE -3.32 (3.4)
## 3     CDP -12300 (1100)     CDE  -82.6 (17)      IO  -238 (51)     CDP -31.5 (9.7)
## 4      IO  -35400 (640)      IO   -278 (32)    ERad  -255 (45)      IO -49.9 (8.3)
## 5     CDE  -41400 (850)    ERad   -302 (26)     CDE  -350 (38)    ERad -64.2 (7.6)
## 6     Imp  -60900 (880)     Imp   -392 (31)     Imp  -465 (44)     Imp   -78 (5.3)
## 7 Stoufer  -79300 (960) Stoufer   -486 (33) Stoufer  -644 (44) Stoufer -97.9 (6.7)

##     model         elpd   model        elpd   model        elpd   model        elpd
## 1     CDO        0 (0)     CDO       0 (0)     CDO       0 (0)     CDP       0 (0)
## 2    ERad    -364 (87)     CDP -6.95 (3.5)     CDP -12.2 (9.4)     CDO   -1.75 (1)
## 3     CDE    -387 (42)     CDE -7.43 (2.9)     CDE  -16.3 (11)     CDE -6.67 (2.3)
## 4     CDP  -2590 (110)      IO -44.1 (8.1)      IO  -42.6 (17)      IO  -6.7 (4.7)
## 5      IO  -3530 (120)    ERad   -55 (7.5)    ERad  -56.8 (20)    ERad -14.5 (5.4)
## 6     Imp  -5990 (170)     Imp  -86.8 (11)     Imp   -129 (23)     Imp -29.5 (7.8)
## 7 Stoufer -11200 (240) Stoufer   -167 (13) Stoufer   -244 (30) Stoufer   -48 (7.5)

##     model         elpd   model        elpd   model        elpd   model         elpd   model       elpd
## 1     CDO        0 (0)     CDO       0 (0)     CDO       0 (0)     CDO        0 (0)     CDO      0 (0)
## 2     CDE    -408 (43)     CDE -33.4 (8.2)     CDE -31.6 (8.8)     CDE    -397 (38)     CDE   -42 (11)
## 3    ERad    -508 (91)    ERad  -90.8 (19)    ERad  -44.4 (23)    ERad    -544 (80)    ERad  -101 (28)
## 4     CDP  -3480 (160)     CDP   -213 (31)     CDP   -371 (41)     CDP  -2900 (150)     CDP  -633 (54)
## 5      IO  -4600 (130)      IO   -517 (47)      IO   -629 (45)      IO  -3980 (130)      IO  -995 (54)
## 6     Imp  -7260 (180)     Imp   -919 (46)     Imp   -832 (48)     Imp  -6200 (170)     Imp -1310 (62)
## 7 Stoufer -13600 (280) Stoufer  -1630 (70) Stoufer  -1940 (88) Stoufer -12200 (270) Stoufer -2870 (99)

##     model        elpd   model        elpd   model         elpd   model       elpd
## 1     CDO       0 (0)     CDO       0 (0)     CDO        0 (0)     CDO      0 (0)
## 2     CDE -33.4 (8.2)     CDE   -143 (18)     CDE    -321 (35)     CDE -45.8 (11)
## 3    ERad  -90.8 (19)    ERad   -222 (39)    ERad    -419 (71)    ERad  -109 (23)
## 4     CDP   -213 (31)     CDP  -1000 (78)     CDP  -2580 (140)     CDP  -453 (45)
## 5      IO   -517 (47)      IO  -1590 (84)      IO  -3550 (110)      IO  -829 (48)
## 6     Imp   -919 (46)     Imp  -2320 (93)     Imp  -5410 (150)     Imp -1100 (54)
## 7 Stoufer  -1630 (70) Stoufer -5260 (180) Stoufer -10900 (240) Stoufer -2570 (93)

Posterior Estimates (Favoured model - CDO)

The CDO model (competing destinations with offset) is ranked first for every data set except the census commuting workflow data for England, for which the Extended Radiation (ERad) model is favoured, and one data set for which CDP is ranked marginally ahead of CDO (elpd difference -1.75, SE 1). For comparison between data sets we use the CDO model.

Posterior Estimates (CDE)

Posterior Estimates (CDP)

Posterior Estimates (ERad)

Posterior Estimates (Stouffer)

Posterior Estimates (IO)

Posterior Estimates (Imp)

Posterior Predictive Checks (CPC)

Imputed BBC flux matrices from CDO Model


Risk difference between imputed Census and BBC commuter flows